(54) Controller failure recovery in an external storage

(57) In an external storage, an I/O process is continued
without any intervention of a user or a host system when a
controller fails. When a failure occurs in a controller (30),
a host system (10) recognizes the failure of the controller
(30). Before the failure is notified to the user and the
application to stop the job, the substitutive controller (40)
reads the SCSI-ID possessed by an SCSI port (31) of the failed
controller (30) from a shared memory (50), registers the
SCSI-ID of the SCSI port (31) to the SCSI port (41) associated
with the substitutive controller (40), and erases, by a port
address resetting facility (45) of the substitutive controller
(40), the SCSI-ID possessed by the SCSI port (31) of the
failed controller (30). Because the SCSI-ID specified at
issuance of an I/O request is thus transferred between the
controllers, the user or the host system need not alter the
I/O request issuing route. Moreover, the transfer can be
conducted while the host system (10, 20) does not recognize
the error.
Description

BACKGROUND OF THE INVENTION 



The present invention relates to a technology to guarantee
high reliability in operation of a plurality of controllers
for input/output (I/O) devices in a computer system, and in
particular, to a method of redundantly arranging controllers
capable of transferring a process therebetween, without
intervention of the user and host systems, at occurrence of a
failure in one of the controllers in an external storage
subsystem adopting the Small Computer System Interface (SCSI),
in which the controllers are arranged in at least a duplicated
configuration and can be accessed from the host systems.

In a system configuration employing the SCSI in which a
plurality of controllers and a storage shared between at least
two controllers are connected by an interface cable in a daisy
chain to the host systems, the plural controllers respectively
have different port addresses such as SCSI-IDs. Ordinarily,
these controllers process I/O requests designated according to
the pertinent port addresses specified by the host systems.

Described in JP-A-4-364514 is a system in which the
controllers are arranged in a multiplex configuration such
that I/O requests from a host apparatus to storages connected
to the plural controllers are processed at a high speed. In
such a conventional system, at occurrence of a failure in one
of the controllers, when the host system alters the
specification of the controller to execute the I/O request,
the I/O request can be processed by a normal controller.
However, in a system in which the host system and the plural
controllers are connected to each other in a daisy chain, no
consideration has been given to a procedure in which, at
occurrence of a failure in a controller, the process is
transferred to a normal controller for execution without
intervention of the host system.

After issuing an I/O request to a controller, the host system
ordinarily monitors termination of the I/O request by a timer
in the host system. When the I/O is not terminated even after
the monitor time predetermined by the host system has elapsed
since the issuance of the I/O request, the host system
provisionally regards the state as an error. After conducting
processes such as a bus recovery process of the SCSI bus, the
host system tries to issue the same I/O request again,
specifying the port address of the failed controller.

When the controller does not respond to the re-issued I/O
request, the host system regards the state as a permanent
error and hence does not thereafter issue any I/O request to
the failed controller. At failure of a controller in the
conventional system, once the host system recognizes the
permanent error, the data process thereof is interrupted.
Therefore, even if a plurality of controllers are provided,
user intervention is required to continue the data process of
the host system at failure of the pertinent controller.
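
The conventional host-side behaviour described above (timeout, bus recovery, retry, then permanent error) can be sketched as follows. This is only an illustrative model: the function name `host_issue_with_retry`, the callables `send_request` and `bus_recovery`, and the `max_retries` parameter are hypothetical and do not appear in the source.

```python
from enum import Enum


class HostResult(Enum):
    OK = "ok"
    PERMANENT_ERROR = "permanent_error"


def host_issue_with_retry(send_request, bus_recovery, max_retries=1):
    """Model of the host's handling of one I/O request.

    send_request() is assumed to return True when the I/O terminates
    within the host's monitor time, and False on a timeout.
    """
    if send_request():
        return HostResult.OK
    # Timeout: the state is provisionally treated as an error.
    for _ in range(max_retries):
        bus_recovery()          # e.g. a bus recovery process on the SCSI bus
        if send_request():      # re-issue the same request to the same port address
            return HostResult.OK
    # No response to the re-issued request: permanent error,
    # after which the host issues no further I/O to that controller.
    return HostResult.PERMANENT_ERROR
```

In this model, the aim of the invention corresponds to making the re-issued request succeed, because a normal controller has taken over the port address before the retry arrives.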

Furthermore, in a case in which a plurality of host systems
are provided, when a controller fails and hangs up with the
bus kept occupied by the failed controller, another data
process being executed between another host system and another
controller is also interrupted. User intervention is also
required to recover the interrupted data process.
SUMMARY OF THE INVENTION



It is therefore an object of the present invention to provide
a failure recovery method and system in which, at occurrence
of a failure in a controller, the process thereof is
transferred to a normal controller to continuously achieve the
data process without any intervention of the host system and
user.

Additionally, in a case in which the failed controller has not
yet received the I/O request from the host system and hence
the error has not been assumed, it is necessary to suppress
I/O requests to the failed controller as far as possible to
prevent an abnormal operation. Consequently, in accordance
with the present invention, the transfer of the port address
and control information is executed after suppressing the
issuance of I/O requests thereto by the host systems.

To achieve the object above according to the present
invention, a normal controller has a function to receive
control information of the failed controller and a function to
reference the port address of the failed controller and add
the contents thereof to its own port address. Furthermore, the
normal controller possesses a function to reset the port
address in the failed controller to thereby erase the port
address.

Thanks to these functions, the normal controller can receive
the port address and control information of the failed
controller and can accept and execute the I/O request issued
to the failed controller. In the operation, there may be
employed a method in which the port address is reset by the
pertinent failed controller itself.

Moreover, according to the present invention, there is
provided a function by which the normal controller monitors a
bus such as an SCSI bus at detection of the failure to thereby
decide whether or not the failed controller has already
received the I/O request from the host system. In a case in
which the failed controller has already received the I/O
request from the host system, the transfer of the port address
and control information of the failed controller is completed
in time to prevent the host system from recognizing the
permanent error, so that the process of the host system
continues without any intervention of the user and host
system.

In addition, when the normal controller is executing an I/O
process at detection of a failure in a controller, it is
assumed that the failed controller has not yet received the
I/O request from the host system. According to the present
invention, there is provided a function to detect this
condition such that the transfer of the port address and
control information of the failed controller is accomplished
during the I/O process execution of the normal controller.

As a result, I/O requests from the host system to the failed
controller can be suppressed until the port address transfer
process is completed. In addition, in a case in which the bus
such as the SCSI bus is not being used by any controller at
detection of the failure, it is considered that the failed
controller has not yet received the I/O request from the host
system. According to the present invention, there is provided
a function in which this condition is detected and the normal
controller selects the failed controller such that the
transfer of the port address and control information is
executed after the selection is accomplished. Thanks to this
function, I/O requests from the host system to the failed
controller can be suppressed until the port address transfer
process is completed.

Owing to adoption of a construction of this type, in a
situation in which a failed controller has received an I/O
request and the execution of the I/O process has not been
terminated, with a bus such as an SCSI bus kept exclusively
reserved by the failed controller, a normal controller detects
the state, completes reception of the port address and control
information, and resets the failed controller within the I/O
monitor time of the host system. This makes it possible for
any subsequent I/O requests to the failed controller to be
received and executed by the normal controller. As a result,
the system can respond to the I/O request reissued from the
host system, and hence both the interruption of the process of
the host system and the inhibition of issuance of I/O requests
from the host system can be prevented.

Moreover, at detection of a failure in a controller, the
normal controller can suppress I/O requests from the host
system to the failed controller. Therefore, in a case in which
the failed controller has not yet received the I/O request,
the host system need not recognize the error and any
subsequent I/O requests can be received by the normal
controller, thereby implementing nonstop system operation.

BRIEF DESCRIPTION OF THE DRAWINGS 

These and other objects and advantages of the present
invention will become apparent by reference to the following
description and accompanying drawings wherein:

Fig. 1 is a hardware configuration diagram showing
an embodiment of the present invention;
Fig. 2 is a diagram of the processing sequence of the host
system at failure of a controller in the embodiment
of Fig. 1;
Fig. 3 is a diagram briefly showing processes to be
executed depending on states of the disk subsystem
in the embodiment of Fig. 1;
Fig. 4 is a flowchart of processing executed at
detection of the controller failure, specifically,
processing executed when the SCSI bus is in the
bus free state in the embodiment of Fig. 1;
Fig. 5 is a flowchart of processing executed at
detection of the controller failure, specifically,
processing executed when the bus is in use in the
embodiment of Fig. 1;
Fig. 6 is a hardware configuration diagram of
another embodiment according to the present
invention; and
Fig. 7 is a schematic diagram showing a method of
implementing the SCSI-ID transfer in the configuration
of the embodiment of Fig. 6.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

Description will now be given in detail of an embodiment
according to the present invention.

In Fig. 1, reference numerals 10 and 20 indicate host systems
as central processors to conduct data processing, and a
numeral 70 denotes a disk array subsystem as a peripheral unit
in a dual controller structure. In the constitution of the
disk array subsystem 70, a numeral 60 designates a standalone
disk for storing therein data of the host systems, numerals 30
and 40 are controllers to supervise data transfers between the
host systems and the standalone disks, and a numeral 50 stands
for a shared memory to transmit information between the
controllers. A reference numeral 71 indicates another
peripheral unit including an input/output (I/O) device 72 and
a controller 73 to control the I/O device 72.

The host systems 10 and 20 are connected via an SCSI cable to
the controllers 30, 40, and 73. In the constitution of the
controller 30, a numeral 31 indicates an SCSI port to control
an SCSI bus on the host system side, a numeral 32 is a cache
memory, a numeral 33 denotes a device-side SCSI port to
control the SCSI bus connecting the standalone disks to the
controller 30, a numeral 34 designates a microprocessor to
control overall operations of the controller 30, a numeral 35
is a port address resetting facility to reset the SCSI port of
the controller 40, a numeral 36 is a data transfer controller
to execute a data transfer between the host system 10 and the
cache memory 32, and a numeral 37 indicates an array data
transfer controller to execute a data transfer between the
cache memory 32 and the standalone disk 60.

The data transfer controller 36 has a function to write, when
transferring data to the cache memory 32, the contents of the
data also in the cache memory 42 of the controller 40. In
addition, the array data transfer controller 37 possesses a
function to generate redundant data for data buffered in the
cache memory 32. This function can also be employed to restore
data.

The controllers 40 and 30 have the same configuration.
Specifically, for each constituent element of the controller
30, a reference number obtained by adding ten to the reference
number of the constituent element indicates the associated
constituent element in the controller 40. The port address
resetting facility 45 can reset the SCSI port 31 of the
controller 30. The port address resetting facilities 35 and 45
reset port addresses, i.e., SCSI-IDs held by the SCSI ports 41
and 31 in the respective controllers 40 and 30. According to
the SCSI standards, the SCSI-IDs can be erased in the next
arbitration phase.

In addition, since the data transfer controller 36 has a
function to write data in the cache memory 32, any data items
transferred from the host systems 10 and 20 are buffered in
duplicate in the respective cache memories 32 and 42. Thanks
to this provision, even when a failure occurs in one of the
controllers, the remaining controller can take over the
process of the failed controller and execute the process using
its own data in the cache memory.

The I/O process flow will be described according to an example
in which the host system 10 achieves a data transfer via the
controller 30. The system 10 issues an I/O request with an
SCSI-ID designating the controller 30. In the controller 30,
the SCSI port 31 holding the SCSI-ID receives the I/O request
and then passes the request to the microprocessor 34. The
microprocessor 34 analyzes the I/O request and then instructs
the data transfer controller 36 to execute a data transfer
between the system 10 and the disk 60.

The transfer data is provisionally buffered in the cache
memory 32 and is then written also in the cache memory 42 in
preparation for a possible failure in the controller 30. In
this connection, the SCSI-ID is set by the microprocessor 34
at initialization of the SCSI port 31, for example, when the
system is powered on. The SCSI-ID is saved in the shared
memory 50 at the same time. Also stored in the shared memory
50 is control information so that the process can be
continuously executed by a normal controller when one of the
controllers fails in the dual controller configuration.
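
As a rough illustration of the two provisions just described (saving the SCSI-ID in the shared memory at port initialization, and buffering write data in both cache memories), a toy model might look like this. The class `Controller`, its attributes, the dictionary layout of `shared_memory`, and the ID values 3 and 4 are all assumptions made for the sketch, not details taken from the source.

```python
class Controller:
    """Toy model of one controller in the dual controller configuration."""

    def __init__(self, name, scsi_id, shared_memory):
        self.name = name
        self.cache = {}            # stands in for cache memory 32 or 42
        self.peer = None           # the other controller, set once both exist
        self.port_ids = {scsi_id}  # IDs the host-side SCSI port answers to
        # The SCSI-ID is saved in the shared memory when the port is initialized.
        shared_memory[name] = {"scsi_id": scsi_id}

    def buffer_write(self, block, data):
        # Buffer the data locally, then also in the partner's cache,
        # so the partner holds a copy if this controller fails.
        self.cache[block] = data
        if self.peer is not None:
            self.peer.cache[block] = data


shared_memory = {}
c30 = Controller("controller30", 3, shared_memory)
c40 = Controller("controller40", 4, shared_memory)
c30.peer, c40.peer = c40, c30
c30.buffer_write(block=7, data=b"payload")
```

After the write, both caches hold block 7, and the shared memory holds each controller's SCSI-ID for a later takeover.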

Referring now to the process sequence of the host 
system at failure of the controller shown in Fig. 2, 
description will be given of a method of continuing an 
I/O operation of the host system according to the 
present invention. 

First, the internal construction of the host system will be
described. In Fig. 2, a numeral 81 is an application program
for executing data processing to achieve various requests from
the user, a numeral 82 denotes a file system for keeping
therein the data structure and controlling I/O requests, a
numeral 83 indicates a device driver for converting an I/O
request into a request mode suitable for a peripheral unit, a
numeral 84 stands for an SCSI card for transmitting an I/O
request to the SCSI bus, a numeral 85 is a transfer I/O
buffer, and a numeral 86 designates a system log in which
failure information of the host systems is accumulated.

Next, a general description will be given of the processing of
the host system 10 when a failure occurs in the controller 30
of the disk subsystem. Receiving an I/O request occurring in
the application 81, the file system 82 issues an I/O request
to the SCSI bus 80 via the device driver 83 and SCSI card 84.
On receiving the request, when the controller 30 detects a
failure in the disk subsystem, the controller 30 reports Check
Condition for the I/O request.

Next, the device driver 83 issues a Request Sense command to
receive Sense Data, which is detailed failure information.
According to the Sense Data, the device driver 83 recognizes
the state of the controller 30. As a result, the driver 83
issues again (retries) the same I/O request. Since the failed
controller 30 cannot execute the re-issued I/O request either,
the device driver 83 instructs an operation to discard the
process associated with the I/O request and repeats the
operation, for example, by a Retry after an Abort message.
After this operation, the driver 83 recognizes the state as a
permanent error and notifies the file system 82 of the
condition.

Receiving the permanent error report, the file system 82 does
not thereafter issue any I/O request to the disk subsystem 70.
The file system 82 then erases non-reflection data of the I/O
buffer 85, records the failure occurrence in the system log,
and sends an error message via the application 81 to the user.
Consequently, depending on the case, the integrity of updated
data cannot be preserved between the application 81, file
system 82, and disk subsystem. In consequence, in any case to
which the present invention is not applied, the user is
required to stop the application and the like, restore the
disk subsystem, and thereafter execute again the sequence of
processes that may have caused the data mismatch in the host
system.

As another example of general processing, there exists a case
in which the controller 30 cannot report Check Condition to
the device driver 83 even at failure. Namely, the controller
30 does not notify the device driver 83 of the occurrence of
the failure. On this occasion, the device driver 83 checks the
state of the disk subsystem by monitoring it for a fixed
period of time indicated by a timer. When the response is not
received within the fixed period of time, the device driver
conducts, as in the example above, the process beginning at
the re-issuance (retry) of the same I/O request.

Referring to Fig. 1, description will be given of an
advantageous feature in which the I/O process can be continued
without user operation in accordance with the present
invention. The controllers 30 and 40 update monitor
information items of the respective controllers in the shared
memory 50 at a fixed interval of time and, moreover, mutually
reference the monitor information thereof.

In a case in which the controllers 30 and 40 are respectively
receiving I/O requests issued respectively from the host
systems 10 and 20, when a failure occurs in the controller 30,
the monitor information of the controller 30 in the shared
memory 50 is updated by the controller 30 to information
indicating the failure. Alternatively, the information is not
updated even when a fixed period of time lapses. Referencing
the monitor information in the shared memory 50, the
controller 40 detects the failure of the controller 30, reads
the SCSI-ID of the SCSI port 31 and control information of the
controller 30 from the shared memory 50, and adds, by the
microprocessor 44, the SCSI-ID of the SCSI port 31 to the SCSI
port 41.
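
The two detection conditions named above (monitor information explicitly marked as failed, or not updated within a fixed period) can be expressed in a few lines. The record layout `MonitorInfo`, the function `peer_has_failed`, and the `timeout` parameter are hypothetical; the source defers the actual detection method to Japanese Patent Application No. 7-139781.

```python
from dataclasses import dataclass


@dataclass
class MonitorInfo:
    failed: bool        # set by the owning controller when it detects its own failure
    last_update: float  # time of the most recent update, in seconds


def peer_has_failed(info: MonitorInfo, now: float, timeout: float) -> bool:
    """Return True if the partner's monitor information indicates a failure.

    A failure is assumed either when the partner has marked itself as
    failed, or when it has not refreshed its entry within `timeout` seconds.
    """
    return info.failed or (now - info.last_update) > timeout
```

The second condition covers a controller that hangs without being able to report anything.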

Additionally, using the port address resetting facility 45,
the controller 40 erases the SCSI-ID possessed by the SCSI
port 31. This enables the SCSI port 41 to accept both an I/O
request issued from the host system 20 and one issued from the
host system 10, so that the retry of the host system 10 is
received and executed by the controller 40.
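
The SCSI-ID transfer itself (read the failed port's ID from the shared memory, add it to the surviving port, then erase it from the failed port) can be sketched as below. `ScsiPort`, `take_over`, the dictionary layout of `shared_memory`, and the ID values used in the usage note are assumptions for illustration only; a real port is a hardware device and is not modelled here.

```python
class ScsiPort:
    """Toy model of a host-side SCSI port that may answer to several IDs."""

    def __init__(self, ids=()):
        self.ids = set(ids)

    def add_id(self, scsi_id):
        self.ids.add(scsi_id)

    def reset(self):
        # Models the port address resetting facility: the port loses its IDs.
        self.ids.clear()

    def accepts(self, scsi_id):
        return scsi_id in self.ids


def take_over(shared_memory, failed_name, own_port, failed_port):
    """Move the failed controller's SCSI-ID onto the surviving port."""
    scsi_id = shared_memory[failed_name]["scsi_id"]  # saved at port initialization
    own_port.add_id(scsi_id)                         # surviving port now answers to both IDs
    failed_port.reset()                              # failed port no longer answers
    return scsi_id
```

For example, with `shared_memory = {"controller30": {"scsi_id": 3}}`, a failed port holding ID 3 and a surviving port holding ID 4, calling `take_over` leaves the surviving port accepting both 3 and 4, so a host retry addressed to ID 3 reaches it unchanged.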

When the retry is normally executed, normal execution of the
I/O request is reported to the file system 82, and the
processing of the host system 10 is normally continued. The
control information includes transit information in relation
to transfers of data from the cache memories 32 and 42 to
standalone disks. Consequently, receiving the control
information, the controller 40 can transfer, in place of the
controller 30, the duplicated data written in the cache memory
42 as alternative data for the Write data remaining as
non-reflection data in the cache memory 32.

Since the method of failure detection and control information
transfer of the controller 30 is not the inherent
characteristic of the present invention and has already been
described in detail in the Japanese Patent Application No.
7-139781 (filed on June 7, 1995) by the applicant of the
present invention, description thereof will be omitted.

For the transfer by the controller 40 of the SCSI-ID of the
SCSI port 31 to the SCSI port 41 and the transfer of control
information of the controller 30 to the controller 40
described above, the associated processing is required to be
appropriately accomplished according to the state of the
controller 30. Otherwise, the transfers cannot be correctly
carried out. According to the present invention, the status of
the failed controller 30, more specifically, the state of
reception by the failed controller 30 of the I/O request from
the host system, is decided on the basis of the usage state
(signal state) of the SCSI bus.

In the following examples, description will be given of a case
in which a failure takes place in the controller 30 of Fig. 1
and the process is continued by the normal controller 40.

Referring next to Fig. 3, description will be given of 
a process to be executed according to the state of the 
disk subsystem. 

In general, it is difficult to completely forecast the
operation to be achieved by the failed controller when an I/O
request is received from the host system. Therefore, in a case
in which the failed controller 30 has not yet received the I/O
request from the host system 10 when the failure of the
controller 30 is detected by the controller 40, the transfer
process of the SCSI-ID, including the addition of the SCSI-ID
to the SCSI port 41 and the resetting of the SCSI port 31, is
executed as early as possible so that the controller 40
receives the I/O request.

However, when an I/O request is issued from the host system 10
with specification of the SCSI-ID during the transfer process
of the SCSI-ID, the controllers 30 and 40 possess the same
SCSI-ID and hence the operation of the SCSI bus becomes
unstable. To cope with this situation, according to the
present invention, there is provided a method in which the
SCSI bus 80 is exclusively occupied by one controller during
the SCSI-ID transfer process so as to suppress the I/O request
issuance from the host system 10.

In accordance with the present invention, the controller 40
monitors the utilization status (signal state) of the SCSI bus
80 to decide whether or not the controller 30 has already
received the I/O request from the host system 10, and executes
a process associated with the decision.

In one of the utilization statuses of the SCSI bus 80, the
SCSI bus 80 may be in the bus free state when a failure is
detected in the controller 30. In this case, since the
controller 30 has not yet received the I/O request, the
controller 40 executes a host operation (the initiator
operation) such that the controller 40 selects the controller
30 to exclusively occupy the SCSI bus 80. This makes it
possible to suppress the issuance of an I/O request from the
host system 10, and the controller 40 conducts the transfer of
the SCSI-ID during this period.

In another of the utilization statuses of the SCSI bus 80, the
controller 40 may be executing an I/O process through the SCSI
bus 80 when a failure is detected in the controller 30. On
this occasion, the controller 30 has not received the I/O
request; however, the SCSI bus 80 is set to the bus free state
at termination of the I/O process, and an I/O request may then
be issued from the host system 10. To overcome this
difficulty, the controller 40 also completes the SCSI-ID
transfer during the execution of the pertinent I/O process. If
the SCSI-ID transfer is not completed during the execution of
the pertinent I/O, the controller 40 does not send the report
of the I/O termination status until the ID transfer is
completely finished.

In another of the utilization statuses of the SCSI bus 80, the
SCSI bus may be in use when a failure is detected in the
controller 30. In this case, the system is in a state in which
the arbitration or selection is being executed according to
the SCSI standards, a state in which another SCSI device
connected to the SCSI bus 80 is using the SCSI bus 80, or a
state in which the controller 30 has already received the I/O
request from the host system 10.

In this situation, the controller 40 monitors the BSY signal
of the SCSI bus 80. As for the monitor period, when the BSY
signal continues for a period of time equal to or more than
the period of time in which the arbitration phase changes via
the selection phase to the message out phase according to the
SCSI standards, it can be decided that the signal is the BSY
signal indicating an I/O process in execution, not the BSY
signal of the bus mastership arbitration. After the signal
decision, the controller 40 executes the SCSI-ID transfer
process at a high speed.
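
The BSY-duration decision can be illustrated with a small classifier over sampled bus states. This is a sketch under stated assumptions: the function `classify_bus`, the sample format, the result labels, and the `threshold` parameter are hypothetical, and no concrete timing value from the SCSI standards is implied.

```python
def classify_bus(samples, threshold):
    """Classify the bus from time-ordered (timestamp, bsy_asserted) samples.

    Returns "bus_free" as soon as BSY is seen deasserted,
    "io_in_progress" once BSY has stayed asserted for at least `threshold`
    (the assumed time needed to pass from arbitration via selection to
    message out), and "undetermined" if neither has been observed yet.
    """
    start = None
    for timestamp, bsy_asserted in samples:
        if not bsy_asserted:
            return "bus_free"
        if start is None:
            start = timestamp
        if timestamp - start >= threshold:
            return "io_in_progress"
    return "undetermined"
```

A BSY signal that outlasts the threshold cannot belong to arbitration alone, which is the basis for treating it as an I/O process in execution.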

If another SCSI device is using the SCSI bus 80, 
the controller 30 has not received the I/O request. 
Therefore, the controller 40 achieves the transfer proc- 
ess at a high speed while another SCSI device is using 
the SCSI bus 80. 

If the controller 30 has already received the I/O request from
the host system 10, the failed controller 30 has already
stopped its operation with the SCSI bus 80 exclusively
possessed by the controller 30. Since the device driver 83 is
monitoring the I/O operation by the internal timer, the
controller 40 is required to execute the SCSI-ID transfer
before the host system 10 conducts the Bus Reset and Retry so
that the controller 40 responds to the Retry. The monitor
period of the controller 40 to monitor the SCSI bus 80 is
sufficiently shorter than the I/O process monitor period of
the host system 10. Consequently, although the controller 40
is required to complete the SCSI-ID transfer prior to the bus
resetting indication from the host system, this can be
satisfactorily achieved thanks to the provision above.

Referring to Figs. 4 and 5, description will be given of a
procedure to acquire the state of the disk subsystem by
monitoring the SCSI bus and an associated procedure of
transferring the SCSI-ID.

Description will be given of a case in which the 
SCSI bus 80 is in the bus free state when a failure of the 
controller 30 is detected by the controller 40 in Fig. 4. 

Since the SCSI bus 80 is in the bus free state (step 400), the
controller 40 recognizes that the controller 30 has not yet
received the I/O request from the host system 10. The
controller 40 then instructs the SCSI port 41 to start the
initiator operation to participate in the arbitration of the
SCSI bus 80 (step 401).

As a result, when the controller 40 remains in the arbitration
(yes in step 402), the controller 40 specifies in the
selection phase the SCSI-ID of the SCSI port 31 of the failed
controller 30. In this situation, even if a failure has
occurred in the controller 30, the SCSI port 31 functions
normally in most cases. Consequently, there is set a state in
which the SCSI port 31 of the controller 30 exclusively
occupies the SCSI bus 80 (step 404). In this state, the
controller 40 adds the SCSI-ID possessed by the SCSI port 31
to the SCSI port 41 (step 405) and then resets the SCSI port
31 (step 406). The SCSI bus 80 exclusively occupied by the
controller 30 is released by resetting the SCSI port 31 and is
returned to the bus free state. Thereafter, the controller 40
receives the I/O request from the host system 10 (step 413).
The I/O process can be continued in this way without any
intervention of the user.

When the controller 40 cannot remain in the arbitration (no in
step 402), it is decided whether or not the controller 40 is
selected by the host system 20 in the selection phase (step
403). If the controller 40 is selected by the host system 20
(yes in step 403), there is set a state in which the
controller 40 exclusively occupies the SCSI bus 80. In this
state, the controller 40 receives the I/O request from the
host system 20 (step 407) and then provisionally interrupts
the processing. The controller 40 adds the SCSI-ID possessed
by the SCSI port 31 to the SCSI port 41 (step 408) and then
resets the SCSI port 31 (step 409). After resetting the port
31, the controller 40 executes the I/O request from the host
system 20 (step 410) and then restores the SCSI bus 80 to the
bus free state. After this point, the controller 40 receives
the I/O request from the host system 10 (step 413).

If the controller 40 does not remain in the arbitration (no in
step 402) and is not selected by the host system 20 (no in
step 403), the controller 40 assumes a state in which the
controller 30 having received the I/O request from the host
system 10 or another SCSI device exclusively occupies the SCSI
bus 80. In this situation, while the state is kept unchanged,
the controller 40 adds the SCSI-ID possessed by the SCSI port
31 to the SCSI port 41 (step 411) and then resets the SCSI
port 31 (step 412). If the controller 30 exclusively occupies
the SCSI bus 80, the SCSI bus 80 is restored to the bus free
state by resetting the SCSI port 31. If another SCSI device
exclusively occupies the SCSI bus 80, the SCSI bus 80 is
restored to the bus free state when the I/O process of the
SCSI device is terminated. Thereafter, the controller 40
accepts the I/O request from the host system 10 (step 413).
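
The three paths through the bus free procedure of Fig. 4 can be summarized as a function that returns the sequence of step numbers visited. The step numbers come from the description above; the function `bus_free_procedure` and its two boolean parameters are hypothetical, and "remains in the arbitration" is read here as winning it, which is an interpretation rather than a statement in the source.

```python
def bus_free_procedure(wins_arbitration, selected_by_other_host):
    """Return the Fig. 4 step numbers visited for the given bus outcome."""
    steps = [400, 401, 402]        # bus free detected; initiator operation; arbitration check
    if wins_arbitration:
        # Select the failed controller so that port 31 occupies the bus,
        # add its SCSI-ID to port 41, then reset port 31.
        steps += [404, 405, 406]
    else:
        steps.append(403)          # was controller 40 selected by host system 20?
        if selected_by_other_host:
            # Receive host 20's request, transfer the ID, reset port 31,
            # then execute host 20's request.
            steps += [407, 408, 409, 410]
        else:
            # Bus held by controller 30 or another device: transfer the ID
            # and reset port 31 while that state lasts.
            steps += [411, 412]
    steps.append(413)              # accept the I/O request from host system 10
    return steps
```

In every path the ID is added to the SCSI port 41 before the SCSI port 31 is reset, and the bus is occupied by some device for the duration of the transfer.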

Referring next to Fig. 5, description will be given of a
processing procedure in a case in which the BSY signal of the
SCSI bus 80 is asserted at detection of the failure of the
controller 30 (step 500).

The controller 40 first determines whether or not the
controller 40 is executing an I/O request from the host system
20 (step 501). If this is not the case (no in step 501), the
controller 40 continuously monitors the state of the SCSI bus
80 for a period of time equivalent to the period in which the
arbitration phase according to the SCSI standards changes via
the selection phase to the message out phase (step 502).

At detection of the failure, if the controller 40 is executing
an I/O operation (yes in step 501) or the controller 40 is
selected by the host system 20 during the monitor operation of
the SCSI bus 80 (left branch in step 502), there is assumed a
state in which the SCSI bus 80 is exclusively occupied by the
controller 40 and the controller 30 has not received the I/O
request. In this state, prior to reporting the termination
status of the I/O execution (step 503), the controller 40 adds
the SCSI-ID possessed by the SCSI port 31 to the SCSI port 41
(step 504) and then resets the SCSI port 31 (step 505). After
resetting the port 31, the controller 40 notifies the I/O
termination status and then terminates the I/O operation (step
506).

The SCSI bus 80 is set to the bus free state when
the I/O execution process is terminated, and the
controller 40 receives any subsequent I/O request from the
host system 10. In this fashion, it is possible to
continuously execute the I/O process without user intervention.

When the bus free state is detected during the
monitor operation of the SCSI bus 80 (central branch in step
502), the process at bus free detection of Fig. 4 is
executed.

If the controller 40 is not executing an I/O operation
and the SCSI bus 80 is not released during the monitor
operation (right branch in step 502), the controller 40
recognizes that the controller 30 or another SCSI device
exclusively occupying the SCSI bus is executing an I/O
operation. Continuing the SCSI bus monitoring
operation (step 508), the controller 40 adds the SCSI-ID
possessed by the SCSI port 31 to the SCSI port 41 (step
509) and then resets the SCSI port 31 (step 510).

When the controller 30 exclusively occupies the
SCSI bus 80, the bus 80 is returned to the bus free state
by resetting the SCSI port 31. When another SCSI
device exclusively occupies the SCSI bus 80, the bus 80
is returned to the bus free state when the I/O operation
of the SCSI device is terminated. Thereafter, the
controller 40 receives the I/O request from the host
system 10. If the bus is released before the SCSI port 31
is completely reset (broken line in step 508), there is
executed the process at detection of the bus free state
shown in Fig. 4.
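
The branch selection of Fig. 5 described above can be summarized in code. The following is an illustrative Python sketch under stated assumptions: the function name bus_busy_recovery, its parameters, and the action strings are hypothetical and merely restate steps 501 to 510 of the description; they are not an implementation of the controller firmware.

```python
from typing import List

FIG4 = "run the bus-free procedure of Fig. 4"


def bus_busy_recovery(executing_io: bool,
                      monitor: str = "still_busy",
                      released_before_reset: bool = False) -> List[str]:
    """Return the ordered actions of controller 40 when BSY is asserted.

    monitor is the (assumed) outcome of step 502: "selected",
    "bus_free" or "still_busy". It is ignored when controller 40 is
    already executing an I/O request (yes in step 501).
    """
    if executing_io or monitor == "selected":
        # Controller 40 holds the bus, so controller 30 has received nothing.
        # The ID is taken over before the status is reported to the host.
        return [
            "add SCSI-ID of port 31 to port 41 (step 504)",
            "reset port 31 (step 505)",
            "report I/O termination status (step 506)",
        ]
    if monitor == "bus_free":
        return [FIG4]
    if monitor != "still_busy":
        raise ValueError(f"unknown monitor outcome: {monitor!r}")
    if released_before_reset:
        # Broken line in step 508: the bus was freed during the takeover.
        return [FIG4]
    return [
        "add SCSI-ID of port 31 to port 41 (step 509)",
        "reset port 31 (step 510)",
        "accept I/O requests from host system 10",
    ]
```

For instance, `bus_busy_recovery(False, "still_busy")` yields the step 509 and step 510 actions followed by acceptance of requests from the host system 10. In every branch that takes over the ID, the addition to the SCSI port 41 precedes the reset of the SCSI port 31.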

As a result of the processing procedure, the I/O
request from the host system 10 can be executed by the
controller 40 when a failure occurs in the controller 30,
thereby preventing the permanent error. Consequently,
the data processing of the system 10 can be normally
continued.

Referring next to Figs. 6 and 7, description will be
given of how the present invention can be implemented in
a configuration of the controller not including the port
address resetting facility.

Fig. 6 is a diagram showing the configuration
developed by removing the port address resetting facility
from the controller of Fig. 1. Numerals 90 and 100
indicate controllers respectively conducting functions of the
controllers 30 and 40 of Fig. 1 and a numeral 50
indicates a shared memory to supply information between
the controllers 90 and 100.

In an internal constitution of the controller 90, a
numeral 34 is a microprocessor controlling overall
operation of the controllers, a numeral 31 indicates an SCSI
port which can be controlled only by the microprocessor
34, a numeral 32 denotes a cache memory, a numeral
33 stands for a device-side SCSI port, a numeral 36
designates a data transfer controller, and a numeral 37
is an array data transfer controller. The controllers 100
and 90 are of the same configuration. In the following
paragraphs, description will be given of an example in
which the controller 90 receives an I/O request from the
host system 10 of Fig. 1 and the controller 100 receives
an I/O request from the host system 20 of Fig. 1. Fig. 7
is a diagram showing an SCSI-ID transfer processing
procedure with its abscissa representing lapse of time.



When a failure occurs in the controller 90, the
controller 100 detects the failure and then sets at a
particular address in the shared memory 50 a failure flag
indicating the occurrence of the failure in the controller 90.
Thereafter, the controller 100 reads the SCSI-ID of
the SCSI port 31 and control information of the
controller 90 from the shared memory 50 and adds by the
microprocessor 44 the SCSI-ID to the SCSI port 41. In
contrast thereto, the controller 90 recognizes its own
failure according to the failure flag in the shared
memory 50 and enters a wait state in which, by use of an
internal timer, the controller does not execute its
operation for a period of time equivalent to the period of time
in which the transfer processing of the controller 100 is
completely executed.

The controller 90 determines through the wait
operation the completion of the processing of the controller
100 and then erases by the microprocessor 34 the
SCSI-ID possessed by the SCSI port 31. As a result,
the SCSI-ID transfer process is terminated and then the
SCSI port 41 is enabled to receive the I/O request from
the host system 10 of Fig. 1.
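
The timing relation of Fig. 7 can be illustrated with a small discrete-time simulation. This Python fragment is only a sketch under assumptions that do not appear in the description: the function name timed_handoff, the tick-based clock, the dictionary standing in for the shared memory 50, and the ID values 0 and 1 are all hypothetical. It counts the ticks during which neither port holds the transferred SCSI-ID, which is zero when the wait of the controller 90 is at least as long as the transfer by the controller 100.

```python
def timed_handoff(transfer_ticks: int, wait_ticks: int):
    """Simulate the handoff without a port address resetting facility.

    transfer_ticks: tick at which controller 100 adds the ID to port 41.
    wait_ticks:     tick at which controller 90 erases the ID of port 31.
    Both are assumed to be >= 1. Returns (port31, port41, unowned_ticks).
    """
    failed_id = 0                      # arbitrary ID of SCSI port 31
    port31, port41 = {failed_id}, {1}  # IDs currently held by each port
    shared = {"failure_flag": False, "scsi_id_port31": failed_id}

    # Tick 0: controller 100 detects the failure and sets the flag.
    shared["failure_flag"] = True

    unowned_ticks = 0
    for tick in range(1, max(transfer_ticks, wait_ticks) + 1):
        if tick == transfer_ticks:
            # Controller 100 adds the ID read from the shared memory.
            port41.add(shared["scsi_id_port31"])
        if tick == wait_ticks and shared["failure_flag"]:
            # Controller 90 leaves the wait state and erases its ID.
            port31.discard(failed_id)
        if failed_id not in port31 and failed_id not in port41:
            unowned_ticks += 1
    return port31, port41, unowned_ticks
```

With `timed_handoff(3, 5)` the ID is always held by at least one port, whereas `timed_handoff(5, 3)` leaves it unowned for two ticks, which is the situation the wait period is meant to exclude.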

Since the SCSI-ID transfer process can be conducted
without using the port address resetting facility as above,
the present invention is effective also in the
configuration not including the port address resetting facility. It is
to be assumed that also in a case in which a failure
occurs in the controller 90, the microprocessor 34 and
SCSI port 31 function normally.

While the present invention has been described
with reference to the particular illustrative embodiments,
it is not to be restricted by those embodiments but only
by the appended claims. It is to be appreciated that
those skilled in the art can change or modify the
embodiments without departing from the scope and spirit of the
present invention.

Claims

1. A failure recovery method for use in a data processing
system including at least one host system (10, 20), a
plurality of controllers (30, 40, 73; 90, 100), and an
interface cable (80) connecting said host system to said
controllers in a daisy chain, said controllers
respectively including therein I/O ports (31, 41) being
connected to said interface cable and having mutually
different IDs (SCSI-IDs), an I/O device being controlled
by a group of at least two controllers (30, 40; 90, 100),
the method comprising the steps of:

detecting, when a failure is detected in a controller
(30; 90) of said group, a utilization state of said
interface cable by a controller (40; 100) as a
substitutive unit of a failed controller (30; 90) of said
group;

deciding, according to the utilization state of said
interface cable, a state of reception by said failed
controller of an I/O request from said host system;

suppressing by a substitutive controller, when the I/O
request is not yet received by said failed controller as
a result of the decision, reception of the I/O request by
said failed controller; adding an ID of an I/O port (31)
related to said failed controller to an I/O port (41) of
said substitutive controller; and resetting the I/O port
related to said failed controller; and

adding by said substitutive controller, when the I/O
request is already received by said failed controller as
a result of the decision, the ID of said I/O port related
to said failed controller to the I/O port of said
substitutive controller and resetting the I/O port
related to said failed controller before said host system
recognizes a permanent error in said failed controller.

2. A failure recovery method according to Claim 1,
wherein, in resetting the I/O port related to said
failed controller, reset is carried out by hardware
resetting means (45) in said substitutive controller.

3. A failure recovery method according to Claim 1,
wherein, in resetting the I/O port related to said
failed controller, said substitutive controller further
includes the steps of:

indicating to said failed controller to reset the
I/O port related to said failed controller after
lapse of a predetermined period of time; and

adding the ID of the I/O port related to said
failed controller to the I/O port of said
substitutive controller within said predetermined period
of time.

4. A failure recovery method according to Claim 1,
wherein said interface cable is a Small Computer
Systems Interface (SCSI) bus cable.

5. A data processing system, comprising:

at least one host system (10, 20);

a plurality of controllers (30, 40, 73; 90, 100);
and

an interface cable (80) connecting said host
system to said controllers in a daisy chain, said
controllers respectively including therein I/O
ports (31, 41) being connected to said interface
cable and having mutually different IDs
(SCSI-IDs);

an I/O device being commonly controlled by a
group of at least two controllers (30, 40; 90,
100); and

a shared memory (50) being commonly
accessed from said group, each of controllers
in said group including a microprocessor,
the microprocessor in each of said controllers
including:




means (44) for detecting a failure in a controller
(30, 90) of said group according to contents of
said shared memory;

means (44) for detecting a utilization state of
said interface cable via an I/O port;

means (44) for deciding, according to the
utilization state of said interface cable, a state of
reception by said failed controller of an I/O
request from said host system;

means (44) for suppressing, when the I/O
request is not yet received by said failed
controller as a result of the decision, reception of
the I/O request by said failed controller; adding
an ID of the I/O port (31) related to said failed
controller to an I/O port (41) of a controller of its
own; and indicating to reset the I/O port related
to said failed controller; and

means for adding, when the I/O request is
already received by said failed controller as a
result of the decision, the ID of the I/O port
related to said failed controller to the I/O port of
the controller of its own; and indicating to reset
the I/O port related to said failed controller
before said host system recognizes a
permanent error in said failed controller.

6. A data processing system according to Claim 5,
wherein each of the controllers of said group
includes hardware resetting means (45) responsive
to an indication from said reset indicating means for
resetting the I/O port related to said failed
controller.

7. A data processing system according to Claim 5,
wherein:

said reset indicating means writes a failure flag
at a predetermined address in said shared
memory, said flag indicating an occurrence of a
failure;

a processor in said failed controller functions
as means for reading said failure flag from said
shared memory and resetting the I/O port
related thereto after lapse of a predetermined
period of time; and

said reset indicating means adds the ID of the
I/O port related to said failed controller to the
I/O port related to own controller within said
predetermined period of time.

8. A data processing system according to Claim 5,
wherein said interface cable is an SCSI bus cable.

9. An external storage for use in a data processing
system including a host system (10, 20), an external
storage (70) including a plurality of controllers
(30, 40, 73; 90, 100) respectively having therein
ports possessing identifiers (IDs) as individual port
addresses and a group of storages (60) controlled
by and shared between said plural controllers, and
an interface cable (80) connecting in a daisy chain
said host system to said plural controllers having
the ports therein, said plural controllers and
storages being accessible from said host system,

said external storage having a function that
at occurrence of a failure in a controller excepting at
least one controller, a normal controller detects the
failure, references a port address of a failed
controller, receives control information of said failed
controller, and adds control information to the port
address thereof.

10. An external storage according to claim 9, further
including a shared memory (50) for each of said
plural controllers for storing therein the port
address and control information of each of said
controllers and thereby transmitting information
between said controllers.

11. An external storage in a data processing system
including a host system (10, 20), an external storage
(70) including a plurality of controllers (30, 40, 73;
90, 100) respectively having therein ports possessing
identifiers (IDs) as individual port addresses
and a group of storages (60) controlled by and
shared between said plural controllers, and an
interface cable (80) connecting in a daisy chain said
host system to said plural controllers having the
ports therein, said plural controllers and storages
being accessible from said host system,

said external storage having a function that
at occurrence of a failure in a controller excepting at
least one controller, a normal controller detects the
failure, references a port address of a failed
controller, receives control information of said failed
controller, and adds the control information to the port
address thereof,

a controller having a port address resetting
facility (45) for resetting the port address of said
failed controller and erasing an ID thereof in such a
manner that the controller resets the port address
of said failed controller, that said failed controller
does not respond to subsequent I/O requests from
said host system, and that said normal controller
having received the port address responds to the
I/O requests.

12. An external storage according to claim 11, wherein,
at occurrence of the failure in the controller, in a
state in which said host system has not executed an
I/O request to said failed controller and said
interface cable connecting said host system to said
controllers is not being used,

a normal controller executes selection for
said failed controller to acquire a bus mastership
between said normal controller and said failed
controller, thereby suppressing issuance of an I/O
request from said host system to said failed
controller during a transfer process of the port address by
said normal controller.

13. An external storage according to claim 11, wherein,
at occurrence of the failure in the controller, in a
state in which said host system has not executed an
I/O request to said failed controller and said normal
controller is using the bus, said normal controller
completes the transfer process of the port address
of said failed controller during the processing of the
I/O request issued from said host system and then
notifies termination of the I/O request, thereby
suppressing issuance of an I/O request from said host
system to said failed controller during the transfer
process of the port address by said normal
controller.

14. An external storage according to claim 11, wherein:

said interface cable is an SCSI cable;

said normal controller monitors, when the bus
is in use at occurrence of the failure in the
controller, a BSY signal of the bus to determine
whether or not the bus is being used by another
device connected to the bus, whether or not the
system is in a transit state from an arbitration
phase to a selection phase according to the
SCSI standards, and whether or not said failed
controller already received an I/O request from
said host system,

said normal controller executes, when the bus
is released during the monitor operation,
selection for said failed controller to attain a bus
mastership between said normal and failed
controllers,

said normal controller completes, when said
normal controller is selected during the monitor
operation, the transfer process of the port
address of said failed controller during the
processing of the I/O request issued from said
host system and then notifies termination of the
I/O request, and

said normal controller terminates during the
monitoring period the transfer process of the
port address of said failed controller.

15. An external storage according to claim 14, wherein
the monitoring period of the bus mastership is set to
be equal to or more than a period of time in which
the arbitration phase is changed via the selection
phase to a message out phase according to the
SCSI standards so as to confirm that the BSY
signal is not associated with arbitration of the bus
mastership but is caused by an I/O execution process,
thereby executing the transfer of the port address of
said failed controller.

16. An external storage in a data processing system
including a host system (10, 20), an external
storage (70) including a plurality of controllers (30, 40,
73; 90, 100) respectively having therein ports
possessing identifiers (IDs) as individual port
addresses and a group of storages (60) controlled
by and shared between said plural controllers, and
an interface cable (80) connecting in a daisy chain
said host system to said plural controllers having
the ports therein, said plural controllers and
storages being accessible from said host system,
wherein:

at occurrence of a failure in a controller
excepting at least one controller, a failed controller
recognizes the failure thereof and enters a wait
state without executing a control operation
thereof in at least a period of time equal to time
in which said normal controller conducts a
transfer process of control information of said
failed controller and addition of a port address;

after said normal controller which recognized
the failure finishes the transfer and addition
processes, said failed controller erases the port
address of said failed controller; and

said normal controller which received the port
address of said failed controller responds to a
subsequent I/O request issued from said host
system since the port address of said failed
controller is already erased.

17. An external storage according to claim 16, wherein
at occurrence of the failure in the controller,
in a state in which said host system has not
executed an I/O request to said failed controller and
said interface cable connecting said host system
to said controllers is not being used, said normal
controller executes selection for said failed
controller to acquire a bus mastership between said
normal controller and said failed controller, thereby
suppressing issuance of an I/O request from said
host system to said failed controller during the
transfer process of the port address by said normal
controller.

18. An external storage according to claim 16, wherein,
at occurrence of the failure in a controller, in a state
in which a host system has not executed an I/O
request to said failed controller and said normal
controller is using the bus, said normal controller
completes the transfer process of the port address
of said failed controller during the processing of the
I/O request issued from said host system and then
notifies termination of the I/O request, thereby
suppressing issuance of an I/O request from said host
system to said failed controller during the transfer
process of the port address by said normal
controller.

19. An external storage according to claim 16, wherein:

when the bus is in use at occurrence of the
failure in the controller, said normal controller
monitors a BSY signal of the bus to determine
whether or not the bus is being used by another
device connected to the bus, whether or not the
system is in a transit state from an arbitration
phase to a selection phase according to the
SCSI standards, and whether or not said failed
controller already received the I/O request from
said host system;

when the bus is released during the monitor
operation, the normal controller executes
selection for said failed controller to attain a bus
mastership between said normal and failed
controllers;

when said normal controller is selected during
the monitor operation, said normal controller
completes the transfer process of the port
address of said failed controller during the
processing of the I/O request issued from said
host system and then notifies the termination of
the I/O request; and

said normal controller terminates during the
monitoring period the transfer process of the
port address of said failed controller.

20. An external storage according to claim 16, wherein 
the monitoring period of the bus mastership is set to 
be equal to or more than a period of time in which 
the arbitration phase changes via the selection 
phase to a message out phase so as to confirm that 
the BSY signal is not associated with arbitration of 
the bus mastership but is caused by an I/O execu- 
tion process, thereby executing the transfer of the 
port address of said failed controller. 

21. A host system and an external storage connected
by an interface cable in a configuration including a
host system, an external storage including a
plurality of controllers respectively having therein ports
possessing identifiers (IDs) as individual port
addresses and a group of storages controlled by
and shared between said plural controllers, and an
interface cable connecting in a daisy chain said
host system to said plural controllers having the
ports therein, said plural controllers and said
storages being accessible from said host system,

said external storage having a function that at
occurrence of a failure in a controller excepting
at least one controller, said normal controller
detects the failure, references the port address
of the failed controller, receives control
information of said failed controller, and adds the
control information to the port address thereof,

said host system having a function that in a
state in which a controller having received an
I/O request issued from the host system cannot
respond thereto due to occurrence of a failure
in the controller, said host system monitors an
I/O completion report from the controller,
issues again the I/O request to said failed
controller after lapse of the predetermined
monitoring period, executes a recovery process
including a resetting operation, recognizes a
permanent error when the controller does not
respond to the recovery process, and notifies
the error to the application, and

said normal controller completing an operation
including the reference, transfer, and additional
port address processes before the permanent
error is recognized, thereby preventing a report
of the permanent error to an application of said
host system.

FIG.2

FIG.3

FIG.4
SCSI-ID TRANSFER PROCESS AT DETECTION OF BUS FREE STATE
(CONTROLLER 30 OF FIG.1 IS IN FAILURE)

FIG.5
SCSI-ID TRANSFER PROCESS AT DETECTION OF BUS BUSY STATE
(CONTROLLER 30 OF FIG.1 IS IN FAILURE)

FIG.7
SCSI-ID TRANSFER IN CONFIGURATION NOT
HAVING PORT ADDRESS RESETTING FACILITY
(EXAMPLE OF I/O TRANSFER BY CONTROLLER